Search results: All records where Creators/Authors contains: "Lu, Wei_D"

  1. Abstract Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access cost. In this work, we propose a process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41–137× and 631–1074× speedup, and 123–383× and 320–602× energy efficiency improvement, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
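The key idea above, performing MAC operations inside DRAM so that only small partial results leave the chip, can be illustrated with a minimal sketch. The function name `pim_gemv`, the bank count, and the row-striping layout are illustrative assumptions, not the paper's actual mapping scheme:

```python
import numpy as np

def pim_gemv(matrix, vector, num_banks=16):
    """Sketch of a PIM-style matrix-vector multiply: rows of the weight
    matrix are striped across DRAM banks, each bank performs its MAC
    operations on locally resident data, and only the small partial
    result vectors are sent to the ASIC for aggregation."""
    rows_per_bank = -(-matrix.shape[0] // num_banks)  # ceiling division
    partials = []
    for b in range(num_banks):
        lo = b * rows_per_bank
        hi = min(lo + rows_per_bank, matrix.shape[0])
        # Each "bank" computes MACs over its own rows; the matrix data
        # never moves off-chip, only this small partial result does.
        partials.append(matrix[lo:hi] @ vector)
    # The ASIC concatenates partials (and would apply non-linear functions).
    return np.concatenate(partials)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
x = rng.standard_normal(32)
y = pim_gemv(W, x)
```

The result is identical to an ordinary matrix-vector product; the benefit in hardware comes from where the MACs execute, not from the arithmetic itself.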
  2. A memristor array has emerged as potential computing hardware for artificial intelligence (AI). Its inherent memory effect allows information storage in the form of easily programmable electrical conductance, making it suitable for efficient data processing without shuttling data between the processor and memory. To realize its full potential for AI applications, fine-tuning of internal device dynamics is required to implement a network system that employs dynamic functions. Here, we provide a perspective on multicationic entropy-stabilized oxides as a widely tunable materials system for memristor applications. We highlight the potential for efficient data processing in machine learning tasks enabled by the implementation of "task-specific" neural networks that derive from this material tunability.
  3. Neuromorphic computing systems promise high energy efficiency and low latency. In particular, when integrated with neuromorphic sensors, they can be used to produce intelligent systems for a broad range of applications. An event‐based camera is such a neuromorphic sensor, inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks (SNNs) that are expensive to train. In this work, a neural network architecture is proposed, the reservoir nodes‐enabled neuromorphic vision sensing network (RN‐Net), based on dynamic temporal encoding by on‐sensor reservoirs and simple deep neural network (DNN) blocks. The reservoir nodes enable efficient temporal processing of asynchronous events by leveraging the native dynamics of the node devices, while the DNN blocks enable spatial feature processing. Combining these blocks in a hierarchical structure, RN‐Net offers efficient processing of both local and global spatiotemporal features. RN‐Net executes dynamic vision tasks created by event‐based cameras at the highest accuracy reported to date, with a network one order of magnitude smaller. The use of simple DNN blocks and standard backpropagation‐based training rules further reduces implementation and training costs.
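The dynamic temporal encoding described above can be sketched in a few lines: each asynchronous event adds a contribution to a reservoir node's state that decays with the device's native time constant, so the state at read time encodes recent temporal history. The function name, the exponential decay form, and the time constant are illustrative assumptions, not the actual device model:

```python
import math

def reservoir_state(event_times, read_time, tau=10.0):
    """Sketch of a reservoir node's short-term dynamics: each input event
    adds a contribution that decays exponentially with time constant tau,
    so events closer to the read time contribute more to the state."""
    state = 0.0
    for t in event_times:
        if t <= read_time:  # only past events influence the state
            state += math.exp(-(read_time - t) / tau)
    return state

# Two bursts with identical event counts but different timing yield
# different states, which is the temporal information the DNN blocks read.
s_recent = reservoir_state([18.0, 19.0], read_time=20.0)
s_old = reservoir_state([2.0, 3.0], read_time=20.0)
```

Because the decay happens "for free" in the device physics, no frame conversion or recurrent training is needed to capture this temporal structure.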
  4. Abstract The constant drive to achieve higher performance in deep neural networks (DNNs) has led to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor‐based compute‐in‐memory (CIM) modules can perform vector‐matrix multiplication (VMM) in place and in parallel, and have shown great promise in DNN inference applications. However, CIM‐based model training faces challenges due to non‐linear weight updates, device variations, and low precision. In this work, a mixed‐precision training scheme is experimentally implemented to mitigate these effects using a bulk‐switching memristor‐based CIM module. Low‐precision CIM modules are used to accelerate the expensive VMM operations, with high‐precision weight updates accumulated in digital units. Memristor devices are only changed when the accumulated weight update value exceeds a pre‐defined threshold. The proposed scheme is implemented with a system‐on‐chip of fully integrated analog CIM modules and digital sub‐systems, showing fast convergence of LeNet training to 97.73% accuracy. The efficacy of training larger models is evaluated using realistic hardware parameters, verifying that CIM modules can enable efficient mixed‐precision DNN training with accuracy comparable to full‐precision software‐trained models. Additionally, models trained on chip are inherently robust to hardware variations, allowing direct mapping to CIM inference chips without additional re‐training.
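The threshold-based update rule described above can be sketched directly: updates accumulate in high precision in digital units, and a device is reprogrammed only when its accumulated update crosses the threshold. The function name, learning rate, and threshold value are illustrative assumptions, not the chip's actual parameters:

```python
import numpy as np

def mixed_precision_update(weights, grads, acc, lr=0.1, threshold=0.05):
    """Sketch of threshold-gated mixed-precision training: gradient updates
    accumulate in a high-precision digital buffer (acc); a memristor weight
    is reprogrammed only when its accumulated update exceeds the threshold,
    keeping expensive, noisy device writes infrequent."""
    acc = acc + lr * grads               # high-precision digital accumulation
    mask = np.abs(acc) >= threshold      # which cells cross the threshold
    weights = weights.copy()
    weights[mask] += acc[mask]           # infrequent analog device writes
    acc = acc.copy()
    acc[mask] = 0.0                      # reset accumulator for written cells
    return weights, acc

w = np.zeros(4)
buf = np.zeros(4)
g = np.array([0.6, 0.01, -0.8, 0.02])
w, buf = mixed_precision_update(w, g, buf)
```

Only the first and third cells are written here; the small updates on the other two stay in the digital buffer until they accumulate past the threshold, which is what makes the scheme tolerant of non-linear and low-precision device programming.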
  5. Analog compute‐in‐memory (CIM) systems are promising candidates for deep neural network (DNN) inference acceleration. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. Herein, a potential security vulnerability is identified wherein an adversary can reconstruct the user's private input data from a power side‐channel attack, even without knowledge of the stored DNN model. An attack approach using a generative adversarial network is developed to achieve high‐quality data reconstruction from power leakage measurements. The analyses show that the attack methodology is effective in reconstructing user input data from power leakage of the analog CIM accelerator, even at large noise levels and after countermeasures. To demonstrate the efficacy of the proposed approach, an example of CIM inference of U‐Net for brain tumor detection is attacked, and the original magnetic resonance imaging medical images can be successfully reconstructed even with added noise whose standard deviation is 20% of the maximum power signal value. This study highlights a potential security vulnerability in emerging analog CIM accelerators and raises awareness of the safety features needed to protect user privacy in such systems.
  6. Abstract Memristive devices have demonstrated rich switching behaviors that closely resemble synaptic functions and provide a building block to construct efficient neuromorphic systems. It is demonstrated that resistive switching effects are controlled not only by the external field, but also by the dynamics of various internal state variables that facilitate the ionic processes. The internal temperature, for example, acts as a second state variable to regulate the ion motion and provides the internal timing mechanism for the native implementation of timing‐ and rate‐based learning rules such as spike‐timing‐dependent plasticity (STDP). In this work, it is shown that the second state variable in a Ta2O5‐based memristor, its internal temperature, can be systematically engineered by adjusting the material properties and device structure, leading to tunable STDP characteristics with different time constants. When combined with an artificial post‐synaptic neuron, the second‐order memristor synapses can spontaneously capture the temporal correlation in the input streaming events.
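The role of the internal time constant in shaping STDP can be sketched with the standard exponential STDP window: the conductance change depends on the sign and magnitude of the pre/post spike time difference, with a decay constant playing the role the engineered internal temperature dynamics play in the device. The function name, amplitude, and time constant are illustrative assumptions, not measured device values:

```python
import math

def stdp_update(pre_time, post_time, tau=5.0, amplitude=0.1):
    """Sketch of an STDP window: a pre-before-post spike pair potentiates
    the synapse, post-before-pre depresses it, and the magnitude decays
    exponentially with the spike-time difference. In a second-order
    memristor, the decaying internal temperature provides this timing
    natively, and tau is set by material and device engineering."""
    dt = post_time - pre_time
    dw = amplitude * math.exp(-abs(dt) / tau)
    return dw if dt > 0 else -dw

dw_pot = stdp_update(0.0, 2.0)   # pre before post: potentiation
dw_dep = stdp_update(2.0, 0.0)   # post before pre: depression
```

Tuning tau, as the work above does by engineering the thermal dynamics, directly widens or narrows the temporal window over which correlated events are captured.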
  7. Network features found in the brain may help implement more efficient and robust neural networks. Spiking neural networks (SNNs) process spikes in the spatiotemporal domain and can offer better energy efficiency than deep neural networks. However, most SNN implementations rely on simple point neurons that neglect the rich neuronal and dendritic dynamics. Herein, a bio‐inspired columnar learning network (CLN) structure that employs feedforward, lateral, and feedback connections to make robust classifications from sparse data is proposed. CLN is inspired by the mammalian neocortex, comprising cortical columns that each contain multiple minicolumns formed by interacting pyramidal neurons. A column continuously processes spatiotemporal signals from its sensor, while learning spatial and temporal correlations between features in different regions of an object along with the sensor's movement through sensorimotor interaction. CLN can be implemented using memristor crossbars with a local learning rule, spike‐timing‐dependent plasticity (STDP), which can be natively obtained in second‐order memristors. CLN allows inputs from multiple sensors to be simultaneously processed by different columns, resulting in higher classification accuracy and better noise tolerance. Analysis of networks implemented on memristor crossbars shows that the system can operate at very low power and high throughput, with high accuracy and robustness to noise.
  8. Abstract Advances in the understanding of nanoscale ionic processes in solid‐state thin films have led to the rapid development of devices based on coupled ionic–electronic effects. For example, ion‐driven resistive‐switching (RS) devices have been extensively studied for future memory applications due to their excellent performance in terms of switching speed, endurance, retention, and scalability. Recent studies further suggest that RS devices are more than just resistors with tunable resistance; instead, they exhibit rich and complex internal ionic dynamics that equip them with native information‐processing capabilities, particularly in the temporal domain. RS effects induced by the migration of different types of ions, often driven by an electric field, are discussed. It is shown that, by taking advantage of the different state variables controlled by the ionic processes, important synaptic functions can be faithfully implemented in solid‐state devices and networks. Recent efforts on improving the controllability of ionic processes to optimize device performance are also discussed, along with new opportunities for material design and engineering enabled by the ability to control ionic processes at the atomic scale. 
  9. Abstract Rapid advances in the semiconductor industry, driven largely by device scaling, are now approaching fundamental physical limits and face severe power, performance, and cost constraints. Multifunctional materials and devices may lead to a paradigm shift toward new, intelligent, and efficient computing systems, and are being extensively studied. Herein, it is examined how, by controlling the internal ion distribution in a solid‐state film, a material's chemical composition and physical properties can be reversibly reconfigured using an applied electric field, at room temperature and after device fabrication. Reconfigurability is observed in a wide range of materials, including commonly used dielectric films, and has led to the development of new device concepts such as resistive random‐access memory. Physical reconfigurability further allows memory and logic operations to be merged in the same device for efficient in‐memory computing and neuromorphic computing systems. By directly changing the chemical composition of the material, coupled electrical, optical, and magnetic effects can also be obtained. A survey of recent fundamental material and device studies that reveal the dynamic ionic processes is included, along with discussions on systematic modeling efforts, device and material challenges, and future research directions.
  10. Abstract Memristors have emerged as transformative devices to enable neuromorphic and in‐memory computing, where success requires the identification and development of materials that can overcome challenges in retention and device variability. Here, a high‐entropy oxide composed of Zr, Hf, Nb, Ta, Mo, and W oxides is demonstrated for the first time as a switching material for valence change memory. This multielement oxide material provides a uniform distribution and higher concentration of oxygen vacancies, limiting the stochastic behavior in resistive switching. (Zr, Hf, Nb, Ta, Mo, W) high‐entropy‐oxide‐based memristors manifest the "cocktail effect," exhibiting retention comparable to HfO2‐ or Ta2O5‐based memristors while also demonstrating the gradual conductance modulation observed in WO3‐based memristors. The electrical characterization of these high‐entropy‐oxide‐based memristors demonstrates forming‐free operation, low device and cycle variability, gradual conductance modulation, 6‐bit operation, and long retention, which are promising for neuromorphic applications.